79 research outputs found

    Gene Interaction Network Suggests Dioxin Induces a Significant Linkage between Aryl Hydrocarbon Receptor and Retinoic Acid Receptor Beta

    Gene expression arrays (gene chips) have enabled researchers to roughly quantify the level of mRNA expression for a large number of genes in a single sample. Several methods have been developed for the analysis of gene array data including clustering, outlier detection, and correlation studies. Most of these analyses are aimed at a qualitative identification of what is different between two samples and/or the relationship between two genes. We propose a quantitative, statistically sound methodology for the analysis of gene regulatory networks using gene expression data sets. The method is based on Bayesian networks for direct quantification of gene expression networks. Using the gene expression changes in HPL1A lung airway epithelial cells after exposure to 2,3,7,8-tetrachlorodibenzo-p-dioxin at levels of 0.1, 1.0, and 10.0 nM for 24 hr, a gene expression network was hypothesized and analyzed. The method clearly demonstrates support for the assumed network and the hypothesis linking the usual dioxin expression changes to the retinoic acid receptor system. Simulation studies demonstrated the method works well, even for small samples

    Bayesian approaches to reverse engineer cellular systems: a simulation study on nonlinear Gaussian networks

    BACKGROUND. Reverse engineering cellular networks is currently one of the most challenging problems in systems biology. Dynamic Bayesian networks (DBNs) seem to be particularly suitable for inferring relationships between cellular variables from the analysis of time series measurements of mRNA or protein concentrations. As evaluating inference results on a real dataset is controversial, the use of simulated data has been proposed. However, DBN approaches that use continuous variables, thus avoiding the information loss associated with discretization, have not yet been extensively assessed, and most of the proposed approaches have dealt with linear Gaussian models. RESULTS. We propose a generalization of dynamic Gaussian networks to accommodate nonlinear dependencies between variables. As a benchmark dataset to test the new approach, we used data from a mathematical model of cell cycle control in budding yeast that realistically reproduces the complexity of a cellular system. We evaluated the ability of the networks to describe the dynamics of cellular systems and their precision in reconstructing the true underlying causal relationships between variables. We also tested the robustness of the results by analyzing the effect of noise on the data, and the impact of a different sampling time. CONCLUSION. The results confirmed that DBNs with Gaussian models can be effectively exploited for a first level analysis of data from complex cellular systems. The inferred models are parsimonious and have a satisfying goodness of fit. Furthermore, the networks not only offer a phenomenological description of the dynamics of cellular systems, but are also able to suggest hypotheses concerning the causal interactions between variables. 
    The proposed nonlinear generalization of Gaussian models yielded models characterized by a slightly lower goodness of fit than the linear model, but a better ability to recover the true underlying connections between variables. Italian Ministry of University and Scientific Research; National Institutes of Health & National Human Genome Research Institute (HG003354-01A2); Collegio Ghislieri, Pavia Italy fellowshi

    A Bayesian Network Driven Approach to Model the Transcriptional Response to Nitric Oxide in Saccharomyces cerevisiae

    The transcriptional response to exogenously supplied nitric oxide in Saccharomyces cerevisiae was modeled using an integrated framework of Bayesian network learning and experimental feedback. A Bayesian network learning algorithm was used to generate network models of transcriptional output, followed by model verification and revision through experimentation. Using this framework, we generated a network model of the yeast transcriptional response to nitric oxide and a panel of other environmental signals. We discovered two environmental triggers, the diauxic shift and glucose repression, that affected the observed transcriptional profile. The computational method predicted the transcriptional control of yeast flavohemoglobin YHB1 by glucose repression, which was subsequently experimentally verified. A freely available software application, ExpressionNet, was developed to derive Bayesian network models from a combination of gene expression profile clusters, genetic information and experimental conditions

    A classification-based framework for predicting and analyzing gene regulatory response

    BACKGROUND: We have recently introduced a predictive framework for studying gene transcriptional regulation in simpler organisms using a novel supervised learning algorithm called GeneClass. GeneClass is motivated by the hypothesis that in model organisms such as Saccharomyces cerevisiae, we can learn a decision rule for predicting whether a gene is up- or down-regulated in a particular microarray experiment based on the presence of binding site subsequences ("motifs") in the gene's regulatory region and the expression levels of regulators such as transcription factors in the experiment ("parents"). GeneClass formulates the learning task as a classification problem — predicting +1 and -1 labels corresponding to up- and down-regulation beyond the levels of biological and measurement noise in microarray measurements. Using the Adaboost algorithm, GeneClass learns a prediction function in the form of an alternating decision tree, a margin-based generalization of a decision tree. METHODS: In the current work, we introduce a new, robust version of the GeneClass algorithm that increases stability and computational efficiency, yielding a more scalable and reliable predictive model. The improved stability of the prediction tree enables us to introduce a detailed post-processing framework for biological interpretation, including individual and group target gene analysis to reveal condition-specific regulation programs and to suggest signaling pathways. Robust GeneClass uses a novel stabilized variant of boosting that allows a set of correlated features, rather than single features, to be included at nodes of the tree; in this way, biologically important features that are correlated with the single best feature are retained rather than decorrelated and lost in the next round of boosting. 
    Other computational developments include fast matrix computation of the loss function for all features, allowing scalability to large datasets, and the use of abstaining weak rules, which results in a shallower and more interpretable tree. We also show how to incorporate genome-wide protein-DNA binding data from ChIP-chip experiments into the GeneClass algorithm, and we use an improved noise model for gene expression data. RESULTS: Using the improved scalability of Robust GeneClass, we present larger scale experiments on a yeast environmental stress dataset, training and testing on all genes and using a comprehensive set of potential regulators. We demonstrate the improved stability of the features in the learned prediction tree, and we show the utility of the post-processing framework by analyzing two groups of genes in yeast (the protein chaperones and a set of putative targets of the Nrg1 and Nrg2 transcription factors) and suggesting novel hypotheses about their transcriptional and post-transcriptional regulation. Detailed results and Robust GeneClass source code are available for download from

    A Visual Data Mining Tool that Facilitates Reconstruction of Transcription Regulatory Networks

    Background: Although the use of microarray technology has seen exponential growth, analysis of microarray data remains a challenge to many investigators. One difficulty lies in the interpretation of a list of differentially expressed genes, or in how to plan new experiments given that knowledge. Clustering methods can be used to identify groups of genes with similar expression patterns, and genes with unknown function can be provisionally annotated based on the concept of "guilt by association", where function is tentatively inferred from the known functions of genes with similar expression patterns. These methods frequently suffer from two limitations: (1) visualization usually only gives access to group membership, rather than specific information about nearest neighbors, and (2) the resolution or quality of the relationships is not easily inferred. Methodology/Principal Findings: We have addressed these issues by improving the precision of similarity detection over that of a single experiment and by creating a tool to visualize tractable association networks: we (1) performed meta-analysis computation of correlation coefficients for all gene pairs in a heterogeneous data set collected from 2,145 publicly available microarray samples in mouse, (2) filtered the resulting distribution of over 130 million correlation coefficients to build new, more tractable distributions from the strongest correlations, and (3) designed and implemented a new Web based tool (StarNet
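
    The correlate-then-filter idea in steps (1) and (2) of the abstract above can be illustrated with a minimal sketch. It assumes a single genes-by-samples matrix and plain Pearson correlation; the meta-analysis across heterogeneous data sets described in the abstract is not reproduced, and the function name, gene labels, and synthetic data are hypothetical.

```python
import numpy as np


def strongest_correlations(expr, gene_names, top_k=3):
    """Return the top_k gene pairs ranked by absolute Pearson correlation.

    expr is a (genes x samples) array. This is an illustrative filter,
    not the procedure used by the tool described in the abstract.
    """
    corr = np.corrcoef(expr)  # gene-by-gene correlation matrix
    i_idx, j_idx = np.triu_indices(corr.shape[0], k=1)  # each pair once
    r = corr[i_idx, j_idx]
    order = np.argsort(-np.abs(r))[:top_k]  # strongest |r| first
    return [(gene_names[i_idx[n]], gene_names[j_idx[n]], float(r[n]))
            for n in order]


# Synthetic example: g0, g1, g2 share a signal (g2 inverted); g3 is noise.
rng = np.random.default_rng(0)
base = rng.normal(size=50)
expr = np.vstack([
    base,
    base + 0.1 * rng.normal(size=50),
    -base + 0.1 * rng.normal(size=50),
    rng.normal(size=50),
])
names = ["g0", "g1", "g2", "g3"]
pairs = strongest_correlations(expr, names, top_k=3)
for a, b, r in pairs:
    print(f"{a} - {b}: r = {r:+.3f}")
```

    Ranking by absolute value keeps strong negative correlations alongside positive ones; whether the original tool treats the two signs the same way is not stated in the abstract.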

    Seeded Bayesian Networks: Constructing genetic networks from microarray data

    Background: DNA microarrays and other genomics-inspired technologies provide large datasets that often include hidden patterns of correlation between genes reflecting the complex processes that underlie cellular metabolism and physiology. The challenge in analyzing large-scale expression data has been to extract biologically meaningful inferences regarding these processes, often represented as networks, in an environment where the datasets are often imperfect and biological noise can obscure the actual signal. Although many techniques have been developed in an attempt to address these issues, to date their ability to extract meaningful and predictive network relationships has been limited. Here we describe a method that draws on prior information about gene-gene interactions to infer biologically relevant pathways from microarray data. Our approach consists of using preliminary networks derived from the literature and/or protein-protein interaction data as seeds for a Bayesian network analysis of microarray results. Results: Through a bootstrap analysis of gene expression data derived from a number of leukemia studies, we demonstrate that seeded Bayesian Networks have the ability to identify high-confidence gene-gene interactions which can then be validated by comparison to other sources of pathway data. Conclusion: The use of network seeds greatly improves the ability of Bayesian Network analysis to learn gene interaction networks from gene expression data. We demonstrate that the use of seeds derived from the biomedical literature or high-throughput protein-protein interaction data, or the combination, provides improvement over a standard Bayesian Network analysis, allowing networks involving dynamic processes to be deduced from the static snapshots of biological systems that represent the most common source of microarray data. Software implementing these methods has been included in the widely used TM4 microarray analysis package.

    Temperature Dependence of the Extrinsic Incubation Period of Orbiviruses in Culicoides Biting Midges

    The rate at which viruses replicate and disseminate in competent arthropod vectors is limited by the temperature of their environment, and this can be an important determinant of geographical and seasonal limits to their transmission by arthropods in temperate regions. Here, we present a novel statistical methodology for estimating the relationship between temperature and the extrinsic incubation period (EIP) and apply it to both published and novel data on virus replication for three internationally important orbiviruses (African horse sickness virus (AHSV), bluetongue virus (BTV) and epizootic haemorrhagic disease virus (EHDV)) in their Culicoides vectors. Our analyses show that there can be differences in vector competence for different orbiviruses in the same vector species and for the same orbivirus in different vector species. Both the rate of virus replication (approximately 0.017-0.021 per degree-day) and the minimum temperature required for replication (11-13°C), however, were generally consistent for different orbiviruses and across different Culicoides vector species. The estimates obtained in the present study suggest that previous publications have underestimated the replication rate and threshold temperature because the statistical methods they used included an implicit assumption that all negative vectors were infected. Robust estimates of the temperature dependence of arbovirus replication are essential for building accurate models of transmission and for informing policy decisions about seasonal relaxations to movement restrictions. The methodology developed in this study provides the required robustness and is superior to methods used previously. Importantly, the methods are generic and can readily be applied to other arbovirus-vector systems, as long as the assumptions described in the text are valid
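
    To give a feel for the figures quoted in the abstract above, the sketch below converts a per-degree-day replication rate and a threshold temperature into an incubation period. It assumes a simple linear degree-day relationship (daily rate proportional to degrees above the threshold, EIP as its reciprocal); the abstract does not state the functional form, so this is an illustration and not the paper's statistical method. The defaults are midpoints of the reported ranges, and the function name is hypothetical.

```python
def eip_days(temp_c, rate_per_degree_day=0.019, t_min_c=12.0):
    """Extrinsic incubation period in days under an assumed linear
    degree-day model: daily rate = rate_per_degree_day * (temp_c - t_min_c).

    Returns infinity at or below the threshold temperature, where the
    model predicts no replication.
    """
    if temp_c <= t_min_c:
        return float("inf")
    return 1.0 / (rate_per_degree_day * (temp_c - t_min_c))


# Tabulate the assumed relationship over a range of constant temperatures.
for temp in (10, 15, 20, 25, 30):
    print(f"{temp} °C -> EIP = {eip_days(temp):.1f} days")
```

    Under this assumption the incubation period shortens rapidly just above the threshold and flattens at higher temperatures, which is why small errors in the threshold estimate matter most near the seasonal limits of transmission.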

    Assessing the potential for Bluetongue virus 8 to spread and vaccination strategies in Scotland

    Europe has seen frequent outbreaks of Bluetongue (BT) disease since 2006, including an outbreak of BT virus serotype 8 in central France during 2015 that has continued to spread in Europe during 2016. Thus, assessing the potential for BTv-8 spread and determining the optimal deployment of vaccination is critical for contingency planning. We developed a spatially explicit mathematical model of BTv-8 spread in Scotland and explored the sensitivity of transmission to key disease spread parameters for which detailed empirical data is lacking. With parameters at mean values, there is little spread of BTv-8 in Scotland. However, under a “worst case” but still feasible scenario with parameters at the limits of their ranges and temperatures 1 °C warmer than the mean, we find extensive spread with 203,000 sheep infected given virus introduction to the south of Scotland between mid-May and mid-June. Strategically targeted vaccine interventions can greatly reduce BT spread. Specifically, despite BT having most clinical impact in sheep, we show that vaccination can have the greatest impact on reducing BTv infections in sheep when administered to cattle, which has implications for disease control policy

    Constructing non-stationary Dynamic Bayesian Networks with a flexible lag choosing mechanism

    Background: Dynamic Bayesian Networks (DBNs) are widely used in regulatory network structure inference with gene expression data. Current methods assume that the underlying stochastic processes that generate the gene expression data are stationary. This assumption is not realistic in certain applications where the intrinsic regulatory networks are subject to changes for adapting to internal or external stimuli. Results: In this paper we investigate a novel non-stationary DBN method with a potential regulator detection technique and a flexible lag choosing mechanism. We apply the approach to gene regulatory network inference on three non-stationary time series data sets. For the Macrophages and Arabidopsis data sets with the reference networks, our method shows better network structure prediction accuracy. For the Drosophila data set, our approach converges faster and shows better prediction accuracy on transition times. In addition, our reconstructed regulatory networks on the Drosophila data not only share many similarities with the predictions of other researchers but also provide much new structural information for further investigation. Conclusions: Compared with recently proposed non-stationary DBN methods, our approach has better structure prediction accuracy. By detecting potential regulators, our method reduces the size of the search space and hence may speed up the convergence of MCMC sampling.

    Decelerating Spread of West Nile Virus by Percolation in a Heterogeneous Urban Landscape

    Vector-borne diseases are emerging and re-emerging in urban environments throughout the world, presenting an increasing challenge to human health and a major obstacle to development. Currently, more than half of the global population is concentrated in urban environments, which are highly heterogeneous in the extent, degree, and distribution of environmental modifications. Because the prevalence of vector-borne pathogens is so closely coupled to the ecologies of vector and host species, this heterogeneity has the potential to significantly alter the dynamical systems through which pathogens propagate, and also thereby affect the epidemiological patterns of disease at multiple spatial scales. One such pattern is the speed of spread. Whereas standard models hold that pathogens spread as waves with constant or increasing speed, we hypothesized that heterogeneity in urban environments would cause decelerating travelling waves in incipient epidemics. To test this hypothesis, we analysed data on the spread of West Nile virus (WNV) in New York City (NYC), the 1999 epicentre of the North American pandemic, during annual epizootics from 2000–2008. These data show evidence of deceleration in all years studied, consistent with our hypothesis. To further explain these patterns, we developed a spatial model for vector-borne disease transmission in a heterogeneous environment. An emergent property of this model is that deceleration occurs only in the vicinity of a critical point. Geostatistical analysis suggests that NYC may be on the edge of this criticality. Together, these analyses provide the first evidence for the endogenous generation of decelerating travelling waves in an emerging infectious disease. Since the reported deceleration results from the heterogeneity of the environment through which the pathogen percolates, our findings suggest that targeting control at key sites could efficiently prevent pathogen spread to remote susceptible areas or even halt epidemics